The replica symmetric behavior of the analogical neural network

Adriano Barra*, Giuseppe Genovese†, Francesco Guerra*
November 2009 

Abstract 

In this paper we continue our investigation of the analogical neural
network, focusing on its replica symmetric behavior in the absence of
external fields of any type. Bridging the neural network to a bipartite
spin glass, we introduce and apply a new interpolation scheme to its free
energy, which naturally extends the interpolation via cavity fields or
stochastic perturbations to these models.
As a result we obtain the free energy of the system as a sum rule, which,
at least at the replica symmetric level, can be solved exactly.
As a next step we study the related self-consistent equations for the
order parameters and their rescaled fluctuations, which are found to
diverge on the same critical line of the standard
Amit-Gutfreund-Sompolinsky theory.

1 Introduction 

The number of disordered models whose description is reached in the frame
of statistical mechanics for complex systems increases year by year [5]. As
a consequence, the need for powerful tools for their analysis grows, and
these tools in turn push the whole field of research further, suggesting
new models to which they can be applied.

Among these, interestingly, neural networks have never been analyzed from
an interpolating, stochastic perturbation, perspective [18]. In fact, from the
early work by Hopfield [27] and the, nowadays historical, theory of Amit,
Gutfreund and Sompolinsky (AGS) [3], [4] to the modern theory for learning
[9], [15], very little is rigorously known about neural networks (thought of
as spin glasses with a Hebb-like "synaptic matrix" [24]).
Surely several contributions appeared (e.g. [1], [?], [11], [12], [13], [31],
[32], [33], [?]), often following the understanding of spin glasses (e.g. [19],
[20], [?], [30], [35]), and the analysis at low level of stored memories has
been achieved.
However, at high level of stored memories, fundamental questions are still
rather obscure. Furthermore, general problems such as the existence of a well
defined thermodynamic limit, achieved for the spin glass case in [22], [23],
are unsolved.

Previously we introduced an "analogical version" of the standard Hopfield
model, by taking the freedom of allowing the learned patterns to live on the
real axis, their probability distribution being a standard Gaussian
$\mathcal{N}[0,1]$ [?].

Within this scenario, we proved the existence of an ergodic phase where
the explicit expressions of all the thermodynamical quantities (free energy,
entropy, internal energy) have been found to self-average around their
annealed expressions in the thermodynamic limit, in complete agreement with
AGS theory.

In this paper, again by using an analogy between neural networks and
bipartite spin glasses, we move on by introducing a novel interpolating
technique (essentially based on two different stochastic perturbations),
which we use to give a complete description of the analogical Hopfield model
phase diagram in the replica symmetric approximation and with high level of
stored memories (i.e. patterns).

Furthermore we control the fluctuations and correlations of the order
parameters of the theory, whose divergences confirm that the transition line
predicted by standard AGS theory holds even in this continuous counterpart.
As a last remark we stress that the whole analysis is carried out without
external fields probing the system response and, as a consequence, neither
retrieval nor the presence of any "magnetization" is discussed; both are
left for future investigation.

The paper is organized as follows: in Sec. 2 we introduce the analogical
neural network with all its statistical mechanics package of definitions and
properties. In Sec. 3 we analyze its replica symmetric behavior by means of
our interpolating scheme, while in Sec. 4 we exploit the fluctuation control
to check for regularities and singularities of the order parameters, obtaining
the critical line for the phase transition from the ergodic regime to a
non-ergodic one.
Sec. 5 is left for conclusion and outlook.



2 Analogical neural network 

We introduce a large network of $N$ two-state neurons $\sigma_i = \pm 1$,
$i \in (1, \ldots, N)$, which are thought of as quiescent (sleeping) when
their value is $-1$ or spiking (emitting a current signal to other neurons)
when their value is $+1$. They interact through a symmetric synaptic matrix
$J_{ij}$ defined according to the Hebb rule for learning,

$$ J_{ij} = \sum_{\mu=1}^{k} \xi_i^\mu \xi_j^\mu. \tag{1} $$

Each random variable $\xi^\mu = \{\xi_1^\mu, \ldots, \xi_N^\mu\}$ represents a
learned pattern and tries to make the overall current in the network (or in
some part of it) stable with respect to itself (when this happens, we say we
have a retrieval state, see e.g. [?]). The analysis of the network assumes
that the system has already stored $k$ patterns (no learning is
investigated), and we are interested in the case in which this number
increases proportionally (linearly) to the system size (high storage level).

In standard literature these patterns are usually taken at random with
distribution $P(\xi_i^\mu) = (1/2)\delta_{\xi_i^\mu,+1} + (1/2)\delta_{\xi_i^\mu,-1}$,
while we extend their support to the real axis, weighted by a Gaussian
probability distribution, i.e.

$$ P(\xi_i^\mu) = \frac{1}{\sqrt{2\pi}}\, e^{-(\xi_i^\mu)^2/2}. \tag{2} $$



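The Hamiltonian of the model is defined as follows

$$ H_N(\sigma;\xi) = -\frac{1}{N}\sum_{\mu=1}^{k}\sum_{i<j}^{N} \xi_i^\mu \xi_j^\mu \sigma_i \sigma_j, \tag{3} $$

which, splitting the summations as $\sum_{i<j} = \frac{1}{2}\sum_{ij} - \frac{1}{2}\sum_{i=j}$,
enables us to write down the following partition function

$$ Z_{N,k}(\beta;\xi) = \sum_{\sigma} \exp\Big( \frac{\beta}{2N}\sum_{\mu=1}^{k}\sum_{ij}^{N} \xi_i^\mu \xi_j^\mu \sigma_i \sigma_j - \frac{\beta}{2N}\sum_{\mu=1}^{k}\sum_{i}^{N} (\xi_i^\mu)^2 \Big) = \tilde{Z}_{N,k}(\beta;\xi) \times e^{-\frac{\beta}{2N}\sum_{\mu=1}^{k}\sum_{i=1}^{N}(\xi_i^\mu)^2}, \tag{4} $$

where $\beta$, the inverse temperature in spin glass theory, sets the level of
noise in the network and we defined

$$ \tilde{Z}_{N,k}(\beta;\xi) = \sum_{\sigma} \exp\Big( \frac{\beta}{2N}\sum_{\mu=1}^{k}\sum_{ij}^{N} \xi_i^\mu \xi_j^\mu \sigma_i \sigma_j \Big). \tag{5} $$

Notice that the last term at the r.h.s. of eq. (4) does not depend on the
particular state of the network.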
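The factorization in eq. (4) can be checked by brute force on a very small network. The following is only a minimal sketch: the function names, the sizes `N`, `K`, the value of `BETA` and the random seed are illustrative assumptions, not part of the paper.

```python
import itertools
import numpy as np

def hamiltonian(sigma, xi):
    """Eq. (3): H = -(1/N) sum_mu sum_{i<j} xi_i^mu xi_j^mu sigma_i sigma_j."""
    n = sigma.size
    full = np.sum((xi @ sigma) ** 2)    # sum over all ordered pairs (i, j)
    diag = np.sum(xi ** 2)              # i = j terms, because sigma_i^2 = 1
    return -(full - diag) / (2.0 * n)

def partition_functions(xi, beta):
    """Brute-force Z of eq. (4) and Z-tilde of eq. (5) over all 2^N states."""
    _, n = xi.shape
    z = 0.0
    z_tilde = 0.0
    for state in itertools.product((-1.0, 1.0), repeat=n):
        sigma = np.array(state)
        z += np.exp(-beta * hamiltonian(sigma, xi))
        z_tilde += np.exp(beta / (2.0 * n) * np.sum((xi @ sigma) ** 2))
    return z, z_tilde

rng = np.random.default_rng(0)          # arbitrary seed (assumption)
N, K, BETA = 8, 3, 0.7                  # illustrative sizes only
xi = rng.standard_normal((K, N))        # Gaussian patterns, eq. (2)
Z, Z_TILDE = partition_functions(xi, BETA)
# State-independent factor coming from the diagonal terms in eq. (4).
DIAGONAL_FACTOR = np.exp(-BETA / (2.0 * N) * np.sum(xi ** 2))
```

Comparing `Z` with `Z_TILDE * DIAGONAL_FACTOR` shows that the diagonal term only rescales the partition function, which is why it can be treated separately.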

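As a consequence, the control of the last term easily follows:

$$ \ln Z_{N,k}(\beta;\xi) = \ln \tilde{Z}_{N,k}(\beta;\xi) - \frac{\beta}{2N}\sum_{\mu=1}^{k}\sum_{i=1}^{N}(\xi_i^\mu)^2 = \ln \tilde{Z}_{N,k}(\beta;\xi) - \frac{\beta}{2}\hat{f}_N, \tag{6} $$

where $\hat{f}_N = \frac{1}{N}\sum_{\mu=1}^{k}\sum_{i=1}^{N}(\xi_i^\mu)^2$ is a sum of
independent random variables, so that $\mathbb{E}\hat{f}_N = k$ and
$(1/N)\mathbb{E}\hat{f}_N = k/N \to \alpha$ as $N \to \infty$, which, in the
thermodynamic limit, simply adds a term $-\alpha\beta/2$ to the free energy
(to be defined in eq. (11)).
Consequently we focus just on $\tilde{Z}(\beta;\xi)$. Let us apply the
Gaussian integration [16] to linearize with respect to the bilinear quenched
memories carried by the $\xi_i^\mu \xi_j^\mu$. The expression for the
partition function (5) becomes (renaming $\tilde{Z} \to Z$ for simplicity)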

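$$ Z_N(\beta;\xi) = \sum_{\sigma} \int \prod_{\mu=1}^{k} d\mu(z_\mu) \exp\Big( \sqrt{\frac{\beta}{N}} \sum_{\mu=1}^{k}\sum_{i=1}^{N} \xi_i^\mu \sigma_i z_\mu \Big), \tag{7} $$

with $d\mu(z_\mu)$ the standard Gaussian measure for all the $z_\mu$.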
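The Gaussian linearization behind eq. (7) rests on the identity $\int d\mu(z)\, e^{cz} = e^{c^2/2}$, applied once per pattern. A minimal numerical sketch for a single pattern follows; the helper `gaussian_expectation`, the quadrature order and the sizes are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expectation(f, order=60):
    """E[f(z)] for z ~ N(0, 1), by Gauss-Hermite quadrature (weight exp(-z^2/2))."""
    nodes, weights = hermegauss(order)
    return np.sum(weights * f(nodes)) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(1)          # arbitrary seed (assumption)
N, BETA = 10, 0.5                       # illustrative values only
xi_mu = rng.standard_normal(N)          # a single Gaussian pattern
sigma = rng.choice([-1.0, 1.0], size=N) # a single neural configuration
field = np.sqrt(BETA / N) * np.dot(xi_mu, sigma)

# One mu-term of eq. (5): exp( beta/(2N) * (sum_i xi_i sigma_i)^2 ).
quadratic = np.exp(BETA / (2.0 * N) * np.dot(xi_mu, sigma) ** 2)
# The same term after linearization, as in eq. (7).
linearized = gaussian_expectation(lambda z: np.exp(field * z))
```

The two numbers agree to quadrature accuracy, which is the content of the step from eq. (5) to eq. (7) for each $\mu$.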

Taking $O$ as a generic function of the neurons, we define the Boltzmann state
$\omega_\beta(O)$ at a given level of noise $\beta$ as

$$ \omega_\beta(O) = \omega(O) = (Z_N(\beta;\xi))^{-1} \sum_{\sigma} O(\sigma)\, e^{-\beta H_N(\sigma;\xi)}, \tag{8} $$

and often we drop the subscript $\beta$ for the sake of simplicity. The
$s$-replicated Boltzmann measure is defined as
$\Omega = \omega^1 \times \omega^2 \times \ldots \times \omega^s$, in which all the
single Boltzmann states are independent states at the same noise level $\beta$
and share an identical distribution of quenched memories $\xi$. For the sake
of clarity, given a function $F$ of the neurons of the $s$ replicas and using
the symbol $a \in [1, \ldots, s]$ to label replicas, such an average
can be written as

$$ \Omega(F(\sigma^1, \ldots, \sigma^s)) = \frac{1}{Z_N^s} \sum_{\sigma^1}\sum_{\sigma^2}\cdots\sum_{\sigma^s} F(\sigma^1, \ldots, \sigma^s) \exp\Big( -\beta \sum_{a=1}^{s} H_N(\sigma^a, \xi) \Big). \tag{9} $$

The average over the quenched memories will be denoted by $\mathbb{E}$ and, for
a generic function of these memories $F(\xi)$, can be written as

$$ \mathbb{E}[F(\xi)] = \int \prod_{\mu=1}^{k}\prod_{i=1}^{N} \frac{d\xi_i^\mu\, e^{-(\xi_i^\mu)^2/2}}{\sqrt{2\pi}}\, F(\xi) = \int F(\xi)\, d\mu(\xi); \tag{10} $$

of course $\mathbb{E}[\xi_i^\mu] = 0$ and $\mathbb{E}[(\xi_i^\mu)^2] = 1$.

We use the symbol $\langle \cdot \rangle$ to mean $\langle \cdot \rangle = \mathbb{E}\Omega(\cdot)$.

In the thermodynamic limit, it is assumed

$$ \lim_{N\to\infty} \frac{k}{N} = \alpha, $$

$\alpha$ being a given real number, parameter of the theory.

For the sake of simplicity we allow a little abuse of notation and use the
symbol $\alpha$ even at finite $N$, still meaning the ratio between the sizes
of the two parties.
The main quantity of interest is the quenched intensive pressure, defined as

$$ A_N(\alpha,\beta) = -\beta f_N(\alpha,\beta) = \frac{1}{N}\mathbb{E}\ln Z_N(\beta;\xi). \tag{11} $$

Here, $f_N(\alpha,\beta) = u_N(\alpha,\beta) - \beta^{-1}s_N(\alpha,\beta)$ is the free
energy density, $u_N(\alpha,\beta)$ the internal energy density and
$s_N(\alpha,\beta)$ the intensive entropy.

Reflecting the bipartite nature of the Hopfield model expressed by eq. (7),
we introduce two other order parameters: the first is the overlap between
the replicated neurons (first party overlap), defined as

$$ q_{ab} = \frac{1}{N}\sum_{i=1}^{N} \sigma_i^a \sigma_i^b \in [-1,+1], \tag{12} $$

and the second is the overlap between the replicated Gaussian variables $z$
(second party overlap), defined as

$$ p_{ab} = \frac{1}{k}\sum_{\mu=1}^{k} z_\mu^a z_\mu^b \in (-\infty,+\infty). \tag{13} $$
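As a concrete reading of eqs. (12) and (13), both overlaps are plain empirical averages of products between two replicas. The sketch below uses hypothetical function names and randomly drawn replicas purely for illustration; in the model the replicas would be sampled from the Boltzmann measure.

```python
import numpy as np

def neuron_overlap(sigma_a, sigma_b):
    """First party overlap q_ab of eq. (12), for two +/-1 configurations."""
    return float(np.mean(sigma_a * sigma_b))

def gaussian_overlap(z_a, z_b):
    """Second party overlap p_ab of eq. (13), for two real-valued replicas."""
    return float(np.mean(z_a * z_b))

rng = np.random.default_rng(2)          # arbitrary seed (assumption)
N, K = 200, 100                         # illustrative sizes, alpha = K/N = 0.5
sigma_1 = rng.choice([-1.0, 1.0], size=N)
sigma_2 = rng.choice([-1.0, 1.0], size=N)
z_1 = rng.standard_normal(K)
z_2 = rng.standard_normal(K)

q_12 = neuron_overlap(sigma_1, sigma_2)  # bounded in [-1, 1]
p_12 = gaussian_overlap(z_1, z_2)        # unbounded real number
```

The different ranges of the two overlaps reflect the different nature of the two parties: dichotomic neurons versus Gaussian variables.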



Both order parameters above play a considerable role in the theory,
as thermodynamical quantities can be expressed through them [?].



3 Replica symmetric free energy



In this section we pay attention to the structure of the free energy: we
want to obtain the latter via a sum rule in which we can explicitly isolate
the order parameter fluctuations, so as to be able to neglect them and
achieve a replica symmetric behavior.

Due to the equivalence between neural networks and bipartite spin glasses, we
generalize the way the cavity field and the stochastic stability techniques
work on spin glasses to these structures by introducing a new interpolation
scheme as follows.

For the sake of clarity, in order to adapt the interpolation method to the
physics of the model, we introduce three free parameters in the
interpolating structure (i.e. $a$, $b$, $c$) that we fix a posteriori, once
the sum rule is almost achieved.

In a pure stochastic stability fashion [20], we also need to introduce two
classes of i.i.d. $\mathcal{N}[0,1]$ variables, namely $N$ variables $\eta_i$
and $k$ variables $\tilde{\eta}_\mu$, whose average is still encoded into the
$\mathbb{E}$ operator, and by which we define the following interpolating
quenched pressure $A_{N,k}(\beta,t)$

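$$ A_{N,k}(\beta,t) = \frac{1}{N}\mathbb{E}\log \sum_{\sigma} \int \prod_{\mu}^{k} d\mu(z_\mu) \exp\Big( \sqrt{t}\sqrt{\frac{\beta}{N}} \sum_{i,\mu}^{N,k} \xi_i^\mu \sigma_i z_\mu \Big) \cdot \exp\Big( a\sqrt{1-t}\sum_{i}^{N} \eta_i \sigma_i \Big) \exp\Big( b\sqrt{1-t}\sum_{\mu}^{k} \tilde{\eta}_\mu z_\mu \Big) \exp\Big( c\,\frac{1-t}{2}\sum_{\mu}^{k} z_\mu^2 \Big). \tag{14} $$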

We stress that $t \in [0,1]$ interpolates between $t = 0$, where the
interpolating quenched pressure reduces to that of non-interacting systems (a
series of one-body problems) whose integration is straightforward, and the
opposite limit, $t = 1$, which recovers the correct quenched free energy.
The plan is then to evaluate the $t$-streaming of such a quantity and then
obtain the correct expression by using the fundamental theorem of calculus:

$$ A_{N,k}(\beta) = A_{N,k}(\beta,t=1) = A_{N,k}(\beta,t=0) + \int_0^1 dt' \Big( \partial_t A_{N,k}(\beta,t) \Big)_{t=t'}. \tag{15} $$

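When evaluating the streaming $\partial_t A$ we get the sum of four terms
($A$, $B$, $C$, $D$): each comes as a consequence of the derivation of a
corresponding exponential term appearing in the expression (14).

Once introduced the averages $\langle\cdot\rangle_t$, which naturally extend the
Boltzmann measure encoded in the interpolating scheme (and reduce to the
proper one whenever setting $t = 1$), we can write them down as

$$ A = \frac{1}{N}\,\mathbb{E}\,\frac{1}{2\sqrt{t}}\sqrt{\frac{\beta}{N}} \sum_{i,\mu}^{N,k} \xi_i^\mu\, \omega(\sigma_i z_\mu) = \frac{\beta}{2N}\sum_{\mu=1}^{k}\langle z_\mu^2\rangle_t - \frac{\alpha\beta}{2}\langle q_{12}p_{12}\rangle_t, $$

$$ B = -\frac{a}{2N\sqrt{1-t}} \sum_{i}^{N} \mathbb{E}\,\eta_i\, \omega(\sigma_i) = -\frac{a^2}{2}\big(1 - \langle q_{12}\rangle_t\big), $$

$$ C = -\frac{b}{2N\sqrt{1-t}} \sum_{\mu}^{k} \mathbb{E}\,\tilde{\eta}_\mu\, \omega(z_\mu) = -\frac{b^2}{2N}\sum_{\mu=1}^{k}\langle z_\mu^2\rangle_t + \frac{\alpha b^2}{2}\langle p_{12}\rangle_t, $$

$$ D = -\frac{c}{2N}\sum_{\mu=1}^{k}\langle z_\mu^2\rangle_t, $$

where in the first three equations we used integration by parts (Wick
theorem).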

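In the replica symmetric ansatz, the order parameters do not fluctuate with
respect to the quenched average, and the only values they take (at any given
point $(\beta, \alpha)$) are $\langle q\rangle = \bar{q}$, $\langle p\rangle = \bar{p}$,
where the bars denote the replica symmetric approximation.

Summing all the contributions ($A$, $B$, $C$, $D$) and adding and subtracting
the term $\alpha\beta\bar{q}\bar{p}/2$ (that we use to center and complete the
square of the two overlaps), we get

$$ \frac{d A_{N,k}(\beta,t)}{dt} = (\beta - b^2 - c)\frac{1}{2N}\sum_{\mu}^{k}\langle z_\mu^2\rangle_t - \frac{\alpha\beta}{2}\langle q_{12}p_{12}\rangle_t - \frac{a^2}{2}\big(1 - \langle q_{12}\rangle_t\big) + \frac{\alpha b^2}{2}\langle p_{12}\rangle_t + \frac{\alpha\beta}{2}\bar{q}\bar{p} - \frac{\alpha\beta}{2}\bar{q}\bar{p}. \tag{16} $$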

So we see that if we choose

$$ a = \sqrt{\alpha\beta\bar{p}}, \qquad b = \sqrt{\beta\bar{q}}, \qquad c = \beta(1-\bar{q}), $$

we get

$$ \frac{d A_{N,k}(\beta,t)}{dt} = -\frac{\alpha\beta}{2}\langle (q_{12}-\bar{q})(p_{12}-\bar{p}) \rangle_t - \frac{\alpha\beta}{2}\bar{p}(1-\bar{q}). \tag{17} $$

Once the expression (17) is inserted into eq. (15), the sum rule for the free
energy is achieved.
In order to get the replica symmetric solution $A^{RS}_{N,k}(\beta)$ we impose
the self-averaging of the overlaps, so that we need to evaluate only

$$ A^{RS}_{N,k}(\beta) = A_{N,k}(\beta,t=0) - \frac{\alpha\beta}{2}\bar{p}(1-\bar{q}) - \frac{\alpha\beta}{2}, \tag{18} $$

where the last term at the r.h.s. comes from the diagonal term of the first
party, as explained in Sec. 2.
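The step from eq. (16) to eq. (17) is pure algebra once $a$, $b$, $c$ are fixed as above, and it can be checked with arbitrary numbers standing in for the averages. In the sketch below all function and argument names are hypothetical, and the numerical values are not physical: they only test the identity.

```python
import math

def streaming_rhs_16(alpha, beta, a, b, c, z2_mean, q12, p12, q12p12):
    """R.h.s. of eq. (16) without the added and subtracted q*p term.

    z2_mean stands for (1/N) sum_mu <z_mu^2>_t; q12, p12 and q12p12 stand for
    <q12>_t, <p12>_t and <q12 p12>_t.
    """
    return ((beta - b * b - c) * z2_mean / 2.0
            - alpha * beta / 2.0 * q12p12
            - a * a / 2.0 * (1.0 - q12)
            + alpha * b * b / 2.0 * p12)

def streaming_rhs_17(alpha, beta, q_bar, p_bar, q12, p12, q12p12):
    """R.h.s. of eq. (17), with <(q12 - q_bar)(p12 - p_bar)>_t expanded."""
    centered = q12p12 - p_bar * q12 - q_bar * p12 + q_bar * p_bar
    return -alpha * beta / 2.0 * centered - alpha * beta / 2.0 * p_bar * (1.0 - q_bar)

ALPHA, BETA, Q_BAR, P_BAR = 0.5, 0.4, 0.3, 0.2      # arbitrary test values
A_PAR = math.sqrt(ALPHA * BETA * P_BAR)             # a = sqrt(alpha beta p_bar)
B_PAR = math.sqrt(BETA * Q_BAR)                     # b = sqrt(beta q_bar)
C_PAR = BETA * (1.0 - Q_BAR)                        # c = beta (1 - q_bar)
MOMENTS = dict(q12=0.35, p12=0.15, q12p12=0.07)     # arbitrary stand-ins

lhs = streaming_rhs_16(ALPHA, BETA, A_PAR, B_PAR, C_PAR, z2_mean=1.7, **MOMENTS)
rhs = streaming_rhs_17(ALPHA, BETA, Q_BAR, P_BAR, **MOMENTS)
```

With this choice the coefficient $\beta - b^2 - c$ vanishes, so the value passed as `z2_mean` is irrelevant and `lhs` coincides with `rhs`.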

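The evaluation of $A_{N,k}(\beta,t=0)$ is easily available because it is a
one-body calculation, which implies factorization in the volume sizes.
Namely, we have to evaluate explicitly the quantity

$$ \begin{aligned} A_{N,k}(\beta,t=0) &= \frac{1}{N}\mathbb{E}\log \sum_{\sigma} e^{\sqrt{\alpha\beta\bar{p}}\sum_i \eta_i\sigma_i} + \frac{1}{N}\mathbb{E}\log \int \prod_{\mu}^{k} \frac{dz_\mu}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\sum_\mu z_\mu^2 (1-\beta(1-\bar{q}))}\, e^{\sqrt{\beta\bar{q}}\sum_\mu \tilde{\eta}_\mu z_\mu} \\ &= \log 2 + \int d\mu(\eta)\log\cosh\big(\sqrt{\alpha\beta\bar{p}}\,\eta\big) + \frac{\alpha}{2}\log\Big(\frac{1}{1-\beta(1-\bar{q})}\Big) + \alpha\,\mathbb{E}\log \int \frac{dr}{\sqrt{2\pi}}\, e^{-r^2/2}\, e^{\sqrt{\beta\bar{q}}\,\sigma\tilde{\eta} r}, \end{aligned} \tag{19} $$

where we introduced $z = \sigma r$, with $\sigma^2$ the Gaussian variance such
that

$$ \sigma^2 = (1-\beta(1-\bar{q}))^{-1}. \tag{20} $$

As a consequence we get

$$ A_{N,k}(\beta,t=0) = \log 2 + \int d\mu(\eta)\log\cosh\big(\sqrt{\alpha\beta\bar{p}}\,\eta\big) + \frac{\alpha}{2}\log\Big(\frac{1}{1-\beta(1-\bar{q})}\Big) + \frac{\alpha\beta}{2}\frac{\bar{q}}{1-\beta(1-\bar{q})}, \tag{21} $$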

and, overall, we can state the next

Theorem 1. The replica symmetric free energy of the analogical Hopfield
neural network is given by the following expression

$$ A^{RS}(\beta,\alpha) = \log 2 + \int d\mu(\eta)\log\cosh\big(\sqrt{\alpha\beta\bar{p}}\,\eta\big) + \frac{\alpha}{2}\log\Big(\frac{1}{1-\beta(1-\bar{q})}\Big) + \frac{\alpha\beta}{2}\frac{\bar{q}}{1-\beta(1-\bar{q})} - \frac{\alpha\beta}{2}\bar{p}(1-\bar{q}) - \frac{\alpha\beta}{2}. \tag{22} $$

Remark 1. We stress that in the ergodic regime, where the overlaps
self-average to zero, the expression recovers the correct ergodic expression,
as well as the annealed expression of the Sherrington-Kirkpatrick (SK) model
when sending $\alpha \to \infty$ and $\beta \to 0$ while keeping
$\sqrt{\alpha}\beta = \beta_{SK}$.
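Eq. (22) is straightforward to evaluate numerically. The sketch below assumes a Gauss-Hermite quadrature for the Gaussian integral; the names `gauss_mean` and `pressure_rs` and the quadrature order are illustrative choices, not taken from the paper. It also probes the two limits mentioned in Remark 1, reading the SK annealed pressure as $\log 2 + \beta_{SK}^2/4$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

_NODES, _WEIGHTS = hermegauss(80)       # quadrature order is an arbitrary choice

def gauss_mean(f):
    """E[f(eta)] for eta ~ N(0, 1), by Gauss-Hermite quadrature."""
    return np.sum(_WEIGHTS * f(_NODES)) / np.sqrt(2.0 * np.pi)

def pressure_rs(beta, alpha, q, p):
    """Replica symmetric pressure of eq. (22); needs 1 - beta*(1 - q) > 0."""
    d = 1.0 - beta * (1.0 - q)
    if d <= 0.0:
        raise ValueError("eq. (22) requires 1 - beta*(1 - q) > 0")
    s = np.sqrt(alpha * beta * p)
    # log cosh(x) = logaddexp(x, -x) - log 2, numerically stable for large |x|.
    log_cosh = gauss_mean(lambda eta: np.logaddexp(s * eta, -s * eta) - np.log(2.0))
    return (np.log(2.0) + log_cosh
            - 0.5 * alpha * np.log(d)
            + 0.5 * alpha * beta * q / d
            - 0.5 * alpha * beta * p * (1.0 - q)
            - 0.5 * alpha * beta)

# Ergodic regime: q = p = 0 gives log 2 - (alpha/2) log(1 - beta) - alpha*beta/2.
ergodic = pressure_rs(0.3, 0.5, 0.0, 0.0)
# SK limit: alpha large, beta small, sqrt(alpha)*beta = beta_SK = 0.5.
sk_like = pressure_rs(5e-4, 1e6, 0.0, 0.0)
```

In the second call the result approaches $\log 2 + \beta_{SK}^2/4$ up to corrections of order $\alpha\beta^3$.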

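Self-consistency relations can be found by setting to zero the
partial derivatives of the free energy with respect to its order parameters,
namely the system $\partial_{\bar{q}} A(\beta,\alpha) = 0$,
$\partial_{\bar{p}} A(\beta,\alpha) = 0$, which gives

$$ \frac{\partial A}{\partial \bar{q}} = \frac{\alpha\beta}{2}\Big( \bar{p} - \frac{\beta\bar{q}}{[1-\beta(1-\bar{q})]^2} \Big) = 0, \tag{23} $$

$$ \frac{\partial A}{\partial \bar{p}} = -\frac{\alpha\beta}{2}\Big( \int d\mu(z)\tanh^2\big(\sqrt{\alpha\beta\bar{p}}\,z\big) - \bar{q} \Big) = 0, \tag{24} $$

by which

$$ \bar{q} = \int d\mu(z)\tanh^2\Big( \frac{\beta\sqrt{\alpha\bar{q}}\,z}{1-\beta(1-\bar{q})} \Big), \tag{25} $$

and as a consequence $\bar{p}(\bar{q}) = \beta\sigma^4\bar{q}$. These conditions
can be seen as a minimax principle defining the replica symmetric solution.
Let us recall that in the spin glass case we have a minimum principle
instead [21].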
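The self-consistency condition can be explored numerically by plain fixed-point iteration on $\bar{q}$, after eliminating $\bar{p}$ through $\bar{p} = \beta\bar{q}/[1-\beta(1-\bar{q})]^2$. This is only a sketch under stated assumptions: the iteration scheme, starting point, quadrature order and function names are illustrative and not taken from the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

_NODES, _WEIGHTS = hermegauss(80)       # quadrature order is an arbitrary choice

def gauss_mean(f):
    """E[f(z)] for z ~ N(0, 1), by Gauss-Hermite quadrature."""
    return np.sum(_WEIGHTS * f(_NODES)) / np.sqrt(2.0 * np.pi)

def update_q(beta, alpha, q):
    """One application of the map q -> E tanh^2(beta sqrt(alpha q) z / (1 - beta(1 - q)))."""
    slope = beta * np.sqrt(alpha * q) / (1.0 - beta * (1.0 - q))
    return gauss_mean(lambda z: np.tanh(slope * z) ** 2)

def solve_q(beta, alpha, q0=0.5, iterations=500):
    """Fixed-point iteration for q_bar; assumes beta <= 1 so the denominator stays positive."""
    q = q0
    for _ in range(iterations):
        q = update_q(beta, alpha, q)
    return q

def p_from_q(beta, q):
    """p_bar = beta * sigma^4 * q_bar, with sigma^2 = 1 / (1 - beta(1 - q_bar))."""
    return beta * q / (1.0 - beta * (1.0 - q)) ** 2

q_ergodic = solve_q(0.3, 0.5)           # below beta_c = 1/(1 + sqrt(0.5)) ~ 0.586
q_broken = solve_q(0.8, 0.5)            # above beta_c
```

Below the critical noise the iteration collapses onto $\bar{q} = 0$, while above it a non-zero solution is selected, in line with the critical line discussed in Sec. 4.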



4 Fluctuations of the order parameters and critical 
line 

We are now ready to separate different regions in the phase diagram, where
different behaviors appear. In particular we want to see where the annealed
regime, characterized by $(\bar{q} = 0, \bar{p} = 0)$, is spontaneously broken
and ergodicity is lost.

To accomplish this task we proceed as follows: at first we introduce the
streaming equation, so as to be able to calculate variations of generic
observables such as overlap correlation functions.

Then we define the centered and rescaled overlaps and introduce their
correlation matrix. Each element of this matrix is then evaluated at $t = 0$
and then propagated up to $t = 1$ via its streaming. This procedure
naturally yields a system of coupled differential equations that,
once solved, gives the expressions of the overlap fluctuations. The latter are
found to diverge on a line in the $(\alpha, \beta)$ plane, which becomes a
natural candidate for a second order phase transition (confirmed by the
regularity of the behavior before such a line is reached from the ergodic
phase).
Let us start the plan by introducing the following

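Proposition 1. Given $O$ as a smooth function of $s$ replica overlaps
$(q_1, \ldots, q_s)$ and $(p_1, \ldots, p_s)$, the following streaming equation
holds:

$$ \frac{d}{dt}\langle O \rangle_t = \beta\sqrt{\alpha}\Big( \sum_{a<b}^{s} \langle O \cdot \xi_{a,b}\eta_{a,b} \rangle_t - s\sum_{a=1}^{s} \langle O \cdot \xi_{a,s+1}\eta_{a,s+1} \rangle_t + \frac{s(s+1)}{2} \langle O \cdot \xi_{s+1,s+2}\eta_{s+1,s+2} \rangle_t \Big). \tag{26} $$

We skip the proof, as it is long but simple and works by direct evaluation,
pretty standard in the disordered system literature (see for example [21],
[?], [?]).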

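The rescaled overlaps $\xi_{12}$ and $\eta_{12}$ are defined according to

$$ \xi_{12} = \sqrt{N}\,(q_{12} - \bar{q}), \tag{27} $$

$$ \eta_{12} = \sqrt{k}\,(p_{12} - \bar{p}). \tag{28} $$

In order to control the overlap fluctuations, namely
$\langle \xi_{12}^2 \rangle_{t=1}$, $\langle \xi_{12}\eta_{12} \rangle_{t=1}$,
$\langle \eta_{12}^2 \rangle_{t=1}$, ..., noting that the streaming equation
pastes two replicas to the ones already involved ($s = 2$ so far), we need to
study nine correlation functions. It is then useful to introduce them and
link them to capital letters so as to simplify their visualization:

$$ \langle \xi_{12}^2 \rangle_t = A(t), \qquad \langle \xi_{12}\xi_{13} \rangle_t = B(t), \qquad \langle \xi_{12}\xi_{34} \rangle_t = C(t), \tag{29} $$

$$ \langle \xi_{12}\eta_{12} \rangle_t = D(t), \qquad \langle \xi_{12}\eta_{13} \rangle_t = E(t), \qquad \langle \xi_{12}\eta_{34} \rangle_t = F(t), \tag{30} $$

$$ \langle \eta_{12}^2 \rangle_t = G(t), \qquad \langle \eta_{12}\eta_{13} \rangle_t = H(t), \qquad \langle \eta_{12}\eta_{34} \rangle_t = I(t). \tag{31} $$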

Let us now sketch their streaming. Let us at first introduce the operator
"dot" as

$$ \dot{\langle O \rangle}_t = \frac{1}{\beta\sqrt{\alpha}}\frac{d\langle O \rangle_t}{dt}, $$

which simplifies calculations and shifts the propagation of the streaming from
$t = 1$ to $t = \beta\sqrt{\alpha}$. Using it we sketch how to write the
streaming of the first two correlations (as it works in the same way for any
other):
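As a worked illustration, and not as the authors' own display, the first two streamings can be spelled out under two assumptions: that the first sum in eq. (26) runs over pairs $a<b$ and its last coefficient is $s(s+1)/2$, and that four-point averages are closed with Wick's rule (Gaussian behavior). The disconnected pieces $\langle O\rangle\langle\xi\eta\rangle$ then cancel because the coefficients add up to zero ($1 - 4 + 3 = 0$ for $s=2$, $3 - 9 + 6 = 0$ for $s=3$), and only the connected pairings survive:

```latex
\begin{aligned}
\dot{A} &= \langle \xi_{12}^2\,\xi_{12}\eta_{12}\rangle_t
  - 2\big(\langle \xi_{12}^2\,\xi_{13}\eta_{13}\rangle_t
        + \langle \xi_{12}^2\,\xi_{23}\eta_{23}\rangle_t\big)
  + 3\,\langle \xi_{12}^2\,\xi_{34}\eta_{34}\rangle_t \\
 &\simeq 2AD - 2\,(2BE + 2BE) + 3\,(2CF)
  \;=\; 2AD - 8BE + 6CF, \\[4pt]
\dot{B} &= \sum_{a<b}^{3}\langle \xi_{12}\xi_{13}\,\xi_{ab}\eta_{ab}\rangle_t
  - 3\sum_{a=1}^{3}\langle \xi_{12}\xi_{13}\,\xi_{a4}\eta_{a4}\rangle_t
  + 6\,\langle \xi_{12}\xi_{13}\,\xi_{45}\eta_{45}\rangle_t \\
 &\simeq (2AE + 2BD + 2BE) - 3\,(2BE + 2BF + 2CE) + 6\,(2CF) \\
 &= 2AE + 2BD - 4BE - 6BF - 6EC + 12CF .
\end{aligned}
```

Each two-point factor is identified with a letter of eqs. (29)-(31) according to how many replica indices the two overlaps share: two, one or none.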



By assuming a Gaussian behavior, as in the strategy outlined in [21], we can
write the overall streaming of the correlation functions in the form of the
following differential system

$$ \begin{aligned}
\dot{A} &= 2AD - 8BE + 6CF, \\
\dot{B} &= 2AE + 2BD - 4BE - 6BF - 6EC + 12CF, \\
\dot{C} &= 2AF + 2CD + 8BE - 16BF - 16CE + 20CF, \\
\dot{D} &= AG - 4BH + 3CI + D^2 - 4E^2 + 3F^2, \\
\dot{E} &= AH + BG - 2BH - 3BI - 3CH + 6CI + 2ED - 2E^2 - 6EF + 6F^2, \\
\dot{F} &= AI + CG + 4BH - 8BI - 8CH + 10CI + 2DF + 4E^2 - 16EF + 10F^2, \\
\dot{G} &= 2GD - 8HE + 6IF, \\
\dot{H} &= 2GE + 2HD - 4HE - 6HF - 6IE + 12IF, \\
\dot{I} &= 2GF + 2DI + 8HE - 16HF - 16IE + 20IF.
\end{aligned} $$
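The coefficients of this system can be regenerated mechanically. The sketch below is an independent check rather than the authors' procedure: it assumes that the streaming equation (26) sums over replica pairs $a<b$ with last coefficient $s(s+1)/2$, and that four-point averages are closed by Wick pairings. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Letters of eqs. (29)-(31): chosen by the two field types and by the number
# of replica indices shared by the two overlaps (2, 1 or 0).
NAMES = {("xi", "xi"): "ABC", ("xi", "eta"): "DEF",
         ("eta", "xi"): "DEF", ("eta", "eta"): "GHI"}

def two_point(x, y):
    """Letter for <x y>, with x = (kind, (a, b)) and y = (kind, (c, d))."""
    shared = len(set(x[1]) & set(y[1]))
    return NAMES[(x[0], y[0])][2 - shared]

def streaming(x, y):
    """Gaussian-closure r.h.s. for O = x*y, returned as {monomial: coefficient}.

    Replicas in x and y must be labelled 1..s without gaps.
    """
    replicas = sorted(set(x[1]) | set(y[1]))
    s = len(replicas)
    new1, new2 = s + 1, s + 2
    terms = [(1, pair) for pair in combinations(replicas, 2)]
    terms += [(-s, (a, new1)) for a in replicas]
    terms += [(s * (s + 1) // 2, (new1, new2))]
    out = Counter()
    for coeff, pair in terms:
        xi_new, eta_new = ("xi", pair), ("eta", pair)
        # Connected pairings only: the disconnected ones cancel out because
        # the coefficients of the three groups of terms sum to zero.
        for u, v in ((x, y), (y, x)):
            monomial = "".join(sorted(two_point(u, xi_new) + two_point(v, eta_new)))
            out[monomial] += coeff
    return {m: c for m, c in out.items() if c != 0}

A_DOT = streaming(("xi", (1, 2)), ("xi", (1, 2)))
D_DOT = streaming(("xi", (1, 2)), ("eta", (1, 2)))
F_DOT = streaming(("xi", (1, 2)), ("eta", (3, 4)))
```

Monomials are stored with their letters sorted, so `"DD"` stands for $D^2$ and `"CE"` for the term written $EC$ in the text.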

It is easy to solve this system, once the initial conditions at $t = 0$ are
known. Our general analysis also covers the case where external fields are
involved; for the sake of brevity, we do not report the full analysis here.

In order to proceed further, in our case of absence of external fields, we
need to evaluate these correlations at $t = 0$. As at $t = 0$ everything is
factorized, the only check needed concerns the correlations inside each party.

Starting with the first party, we have to study $A$, $B$, $C$ at $t = 0$. As
only the diagonal terms give a non-negligible contribution, it is immediate
to work out this first set of starting points as



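$$ A(0) = N^{-1}\sum_{i}^{N} \big( 1 - 2\bar{q}\,\langle \sigma_i^1\sigma_i^2 \rangle + \bar{q}^2 \big) = 1 - \bar{q}^2, \tag{32} $$

$$ B(0) = N^{-1}\sum_{i}^{N} \big( \langle \sigma_i^2\sigma_i^3 \rangle - \bar{q}\,\langle \sigma_i^1\sigma_i^2 \rangle - \bar{q}\,\langle \sigma_i^1\sigma_i^3 \rangle + \bar{q}^2 \big) = \bar{q} - \bar{q}^2, \tag{33} $$

$$ C(0) = N^{-1}\sum_{ij}^{N,N} \big( \langle \sigma_i^1\sigma_i^2\sigma_j^3\sigma_j^4 \rangle - \bar{q}\,\langle \sigma_i^1\sigma_i^2 \rangle - \bar{q}\,\langle \sigma_j^3\sigma_j^4 \rangle + \bar{q}^2 \big) = \int d\mu(z)\tanh^4\big(\sqrt{\alpha\beta\bar{p}}\,z\big) - \bar{q}^2, \tag{34} $$

where we stress that even in the last equation only the diagonal terms $i = j$
contribute.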

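For the second party we need to evaluate $G$, $H$, $I$ at $t = 0$. The only
difference with the first party is that its elements are not dichotomic, so
that $z^2 \neq 1$, contrary to the $\sigma$'s.

It is immediate to check that $G(0)$, $H(0)$, $I(0)$ are functions of
$\omega(z^2)$ and $\omega^2(z)$, which are Gaussian integrals and can be
worked out as

$$ \omega(z) = \frac{\int z\, e^{\sqrt{\beta\bar{q}}\,\tilde{\eta} z}\, e^{\frac{\beta}{2}(1-\bar{q})z^2 - z^2/2}\, dz}{\int e^{\sqrt{\beta\bar{q}}\,\tilde{\eta} z}\, e^{\frac{\beta}{2}(1-\bar{q})z^2 - z^2/2}\, dz} = \sqrt{\beta\bar{q}}\,\tilde{\eta}\,\sigma^2, \tag{35} $$

$$ \omega(z^2) = \frac{\int z^2\, e^{\sqrt{\beta\bar{q}}\,\tilde{\eta} z}\, e^{\frac{\beta}{2}(1-\bar{q})z^2 - z^2/2}\, dz}{\int e^{\sqrt{\beta\bar{q}}\,\tilde{\eta} z}\, e^{\frac{\beta}{2}(1-\bar{q})z^2 - z^2/2}\, dz} = \sigma^2 + \beta\bar{q}\,\tilde{\eta}^2\sigma^4. \tag{36} $$

Remembering that $\beta\sigma^4\bar{q} = \bar{p}$ (cf. the self-consistency
relations (23)-(24)), we get

$$ G(0) = \mathbb{E}\,\omega(z^2)\,\omega(z^2) - \bar{p}^2 = \mathbb{E}\,\sigma^4\big(1 + \beta\bar{q}\sigma^2\tilde{\eta}^2\big)^2 - \bar{p}^2, $$

$$ H(0) = \mathbb{E}\,\omega(z^2)\,\omega(z)^2 - \bar{p}^2 = \mathbb{E}\,\sigma^2\big(1 + \beta\bar{q}\sigma^2\tilde{\eta}^2\big)\,\beta\bar{q}\,\tilde{\eta}^2\sigma^4 - \bar{p}^2, $$

$$ I(0) = \mathbb{E}\,\omega^4(z) - \bar{p}^2 = \mathbb{E}\,(\beta\bar{q})^2\tilde{\eta}^4\sigma^8 - \bar{p}^2. $$

The last missing step is averaging over the $\tilde{\eta}$, by exploiting
$\mathbb{E}\tilde{\eta}^2 = 1$, $\mathbb{E}\tilde{\eta}^4 = 3$.
Finally, we obviously have $D(0) = E(0) = F(0) = 0$, because at $t = 0$ the
two parties are independent.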
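For completeness, the averaging over $\tilde{\eta}$ can be carried out explicitly. The following lines are a worked step, not a formula quoted from the paper; they assume the Gaussian moments $\omega(z) = \sqrt{\beta\bar{q}}\,\tilde{\eta}\sigma^2$ and $\omega(z^2) = \sigma^2 + \beta\bar{q}\tilde{\eta}^2\sigma^4$, and use $\bar{p} = \beta\sigma^4\bar{q}$:

```latex
\begin{aligned}
G(0) &= \sigma^4\big(1 + 2\beta\bar{q}\sigma^2 + 3\beta^2\bar{q}^2\sigma^4\big) - \bar{p}^2
      = \sigma^4 + 2\sigma^2\bar{p} + 2\bar{p}^2, \\
H(0) &= \beta\bar{q}\sigma^6\big(1 + 3\beta\bar{q}\sigma^2\big) - \bar{p}^2
      = \sigma^2\bar{p} + 2\bar{p}^2, \\
I(0) &= 3\beta^2\bar{q}^2\sigma^8 - \bar{p}^2
      = 2\bar{p}^2 .
\end{aligned}
```

In the annealed region $\bar{q} = \bar{p} = 0$ these reduce to $G(0) = \sigma^4 = (1-\beta)^{-2}$ and $H(0) = I(0) = 0$.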

Since we are interested in finding where ergodicity becomes broken (the
critical line), we start propagating $t \in 0 \to 1$ from the annealed region,
where $\bar{q} = 0$ and $\bar{p} = 0$.

It is immediate to check that, for the only terms that we need to consider,
$A$, $D$, $G$ (the others being strictly zero on the whole $t \in [0,1]$), the
starting points are $A(0) = 1$, $D(0) = 0$, $G(0) = (1-\beta)^{-2}$ and their
evolution is ruled by

$$ \dot{A} = 2AD, \tag{37} $$

$$ \dot{D} = AG + D^2, \tag{38} $$

$$ \dot{G} = 2GD. \tag{39} $$

So we need to solve the system above. The first step is noticing that

$$ \partial_t \log A = \frac{\dot{A}}{A} = 2D = \frac{\dot{G}}{G} = \partial_t \log G; $$

as $d(A/G)/dt = 0$ and $A(0)/G(0) = (1-\beta)^2$, we obtain immediately the
coupled behavior of the self-correlations:

$$ A(t) = G(t)(1-\beta)^2. \tag{40} $$

We are now reduced to considering the system

$$ \dot{D} = (1-\beta)^2 G^2 + D^2, \tag{41} $$

$$ \dot{G} = 2GD. \tag{42} $$

Let us call $Y = D + (1-\beta)G$, such that summing (41) and $(1-\beta)$ times
(42) we get the differential equation

$$ \dot{Y}(t) = Y^2(t) \;\Rightarrow\; Y(t) = \frac{Y_0}{1 - tY_0}, $$

by which, as $Y_0 = (1-\beta)^{-1}$, we get

$$ D(t = \beta\sqrt{\alpha}) + (1-\beta)\,G(t = \beta\sqrt{\alpha}) = \frac{1}{1 - \beta(1+\sqrt{\alpha})}, \tag{43} $$

i.e. there is a regular behavior up to $\beta_c = 1/(1+\sqrt{\alpha})$.

Now, starting from eq. (43), we have to solve separately for $G(t)$ and for
$D(t)$.

Let us at first notice that

$$ \dot{G}(t) = 2G(t)\big( Y(t) - (1-\beta)G(t) \big), \tag{44} $$

by which, dividing both sides by $G^2$ and considering $Z = G^{-1}$, we get

$$ -\dot{Z}(t) - 2Y(t)Z(t) + 2(1-\beta) = 0, \tag{45} $$

namely an ordinary first order differential equation for $Z(t)$.

We solve it by posing $Z(t) = W(t)\exp\big( -2\int_0^t Y(t')dt' \big)$, with
$Z(0) = W(0)$, fixing the auxiliary function $W(t)$ as

$$ \dot{W}(t) = 2(1-\beta)\exp\Big( 2\int_0^t Y(t')dt' \Big), \qquad \int_0^t Y(t')dt' = \log\Big( \frac{1-\beta}{1-\beta-t} \Big). $$

We can obtain in a few algebraic steps the function $Z(t)$ and consequently,
remembering that $G(t) = Z^{-1}(t)$, we get

$$ G(t) = \frac{1}{2(1-\beta)}\Big( \frac{1}{1-\beta-t} + \frac{1}{1-\beta+t} \Big) = \frac{1}{(1-\beta)^2 - t^2}. \tag{46} $$

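Now it is possible to insert eq. (46) into (43), which concludes the proof of
the following

Theorem 2. In the ergodic region the behavior of the overlap fluctuations
is regular and described by the following equations:

$$ \langle \xi_{12}^2 \rangle = \frac{(1-\beta)^2}{(1-\beta)^2 - \alpha\beta^2}, \tag{47} $$

$$ \langle \xi_{12}\eta_{12} \rangle = \frac{\beta\sqrt{\alpha}}{(1-\beta)^2 - \alpha\beta^2}, \tag{48} $$

$$ \langle \eta_{12}^2 \rangle = \frac{1}{(1-\beta)^2 - \alpha\beta^2}. \tag{49} $$

The ergodic region ends in the line

$$ \beta_c = \frac{1}{1+\sqrt{\alpha}}, \tag{50} $$

which is the critical line.

We stress that it turns out to be the same AGS line of the standard neural
network counterpart.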
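The closed forms of Theorem 2 can be cross-checked by integrating the reduced system (37)-(39) numerically. The sketch below uses a hand-written fourth-order Runge-Kutta scheme in the rescaled time of the "dot" operator; the integrator, the step count and the test values of $\alpha$ and $\beta$ are illustrative assumptions.

```python
import math

def rhs(state):
    """Right-hand sides of eqs. (37)-(39) for state = (A, D, G)."""
    a, d, g = state
    return (2.0 * a * d, a * g + d * d, 2.0 * g * d)

def integrate(beta, alpha, steps=2000):
    """RK4 integration from rescaled time 0 up to beta*sqrt(alpha)."""
    state = (1.0, 0.0, (1.0 - beta) ** -2)      # A(0), D(0), G(0)
    h = beta * math.sqrt(alpha) / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(s + h * (p + 2.0 * q + 2.0 * r + w) / 6.0
                      for s, p, q, r, w in zip(state, k1, k2, k3, k4))
    return state

def closed_form(beta, alpha):
    """Eqs. (47)-(49): (<xi^2>, <xi eta>, <eta^2>) in the ergodic region."""
    t = beta * math.sqrt(alpha)
    den = (1.0 - beta) ** 2 - t * t
    return ((1.0 - beta) ** 2 / den, t / den, 1.0 / den)

def critical_beta(alpha):
    """Critical line of eq. (50)."""
    return 1.0 / (1.0 + math.sqrt(alpha))

numeric = integrate(0.3, 0.5)           # beta = 0.3 < beta_c(0.5) ~ 0.586
exact = closed_form(0.3, 0.5)
```

The common denominator $(1-\beta)^2 - \alpha\beta^2$ vanishes exactly at `critical_beta(alpha)`, which is where all three fluctuations diverge.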



5 Conclusion and outlook 

In this paper we achieved another step toward a general theory of neural
networks whose statistical mechanics is not based on the replica trick.
We found the replica symmetric behavior of the analogical Hopfield model,
its self-averaging equations for the order parameters and a complete
quantitative picture of their fluctuations and correlations. The critical
line defining ergodicity breaking is found as well, in agreement with the
standard AGS counterpart.

Furthermore the method paves the way for analytical investigation of general
bipartite systems, which are themselves assuming a very important role
in applied statistical mechanics [13].

Despite these new results, fundamental questions are still open: apart from
the challenging thermodynamic limit, the retrieval phase (the response to an
external stimulus) has not been discussed so far, nor has the replica symmetry
breaking scheme, which should be incorporated in the theory too.
We plan to report soon on these topics.